Event-Learning with a Non-Markovian Controller

Authors

  • István Szita
  • András Lörincz
Abstract

Recently, a novel reinforcement learning algorithm called event-learning or E-learning was introduced. The algorithm is based on events, which are defined as ordered pairs of states. In this setting, the agent optimizes the selection of desired sub-goals by a traditional value-policy function iteration, and uses a separate algorithm, called the controller, to achieve these goals. The advantage of event-learning lies in its potential in non-stationary environments, where the near-optimality of the value iteration is guaranteed by the generalized ε-stationary MDP model. With a particular non-Markovian controller, the SDS controller, an ε-MDP problem arises in E-learning. We illustrate the properties of E-learning augmented by the SDS controller through computer simulations.
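
As a rough illustration of the scheme the abstract describes, the following minimal Python sketch keeps a tabular event-value function E(x, y) over ordered state pairs, picks the desired successor state y as a sub-goal, and delegates the low-level action choice to a separate controller. This is a sketch under our own assumptions: the names (EventLearner, run_episode, the env interface) are illustrative, and the paper's actual SDS controller is not reproduced here, only abstracted into the controller(x, y) argument.

    import numpy as np

    class EventLearner:
        # Tabular event-learning sketch: values attach to ordered state
        # pairs (x, y) -- "events" -- rather than to state-action pairs.

        def __init__(self, n_states, alpha=0.1, gamma=0.95, epsilon=0.1):
            self.E = np.zeros((n_states, n_states))  # event values E(x, y)
            self.n_states = n_states
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def pick_subgoal(self, x, rng):
            # epsilon-greedy choice of the desired successor state y
            if rng.random() < self.epsilon:
                return int(rng.integers(self.n_states))
            return int(np.argmax(self.E[x]))

        def update(self, x, y_desired, reward, x_actual):
            # Value iteration over events: bootstrap from the state that
            # was actually reached, whether or not the controller managed
            # to attain y_desired (that gap is what the ε-MDP model
            # accounts for).
            target = reward + self.gamma * np.max(self.E[x_actual])
            self.E[x, y_desired] += self.alpha * (target - self.E[x, y_desired])

    def run_episode(env, learner, controller, rng, max_steps=100):
        # `controller(x, y)` is assumed to return a low-level action meant
        # to drive the system from state x toward the desired state y; in
        # the paper this role is played by the (non-Markovian) SDS
        # controller. `env.reset()` / `env.step()` follow a hypothetical
        # gym-like interface returning (next_state, reward, done).
        x = env.reset()
        for _ in range(max_steps):
            y = learner.pick_subgoal(x, rng)
            x_next, reward, done = env.step(controller(x, y))
            learner.update(x, y, reward, x_next)
            x = x_next
            if done:
                break

The separation is the point of the design: the event values remain well defined even when the controller only approximately realizes the requested transitions, which is exactly the ε-MDP situation the abstract refers to.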

Similar articles

Reinforcement Learning in Markovian and Non-Markovian Environments

This work addresses three problems with reinforcement learning and adaptive neuro-control: 1. Non-Markovian interfaces between learner and environment. 2. On-line learning based on system realization. 3. Vector-valued adaptive critics. An algorithm is described which is based on system realization and on two interacting fully recurrent continually running networks which may learn in parallel. ...

Noisy K Best-Paths for Approximate Dynamic Programming with Application to Portfolio Optimization

We describe a general method to transform a non-Markovian sequential decision problem into a supervised learning problem using a K-best-paths algorithm. We consider an application in financial portfolio management where we can train a controller to directly optimize a Sharpe Ratio (or other risk-averse non-additive) utility function. We illustrate the approach by demonstrating experimental resul...
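
As a side note on why a Sharpe-ratio objective does not fit the standard additive-reward MDP template, here is a minimal Python sketch of the utility; the function name and the small variance guard are our own assumptions, not from the paper.

    import numpy as np

    def sharpe_ratio(returns, risk_free=0.0):
        # Mean excess return divided by its standard deviation. The
        # variance term couples the whole return sequence, so the utility
        # is not a sum of per-step rewards (non-additive), which is what
        # pushes the problem outside the plain MDP formulation.
        excess = np.asarray(returns, dtype=float) - risk_free
        return float(excess.mean() / (excess.std() + 1e-12))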

Auxiliary Gibbs Sampling for Inference in Piecewise-Constant Conditional Intensity Models

A piecewise-constant conditional intensity model (PCIM) is a non-Markovian model of temporal stochastic dependencies in continuous-time event streams. It allows efficient learning and forecasting given complete trajectories. However, no general inference algorithm has been developed for PCIMs. We propose an effective and efficient auxiliary Gibbs sampler for inference in PCIMs, based on the idea ...

New Approach to Exponential Stability Analysis and Stabilization for Delayed T-S Fuzzy Markovian Jump Systems

This paper is concerned with delay-dependent exponential stability analysis and stabilization for continuous-time T-S fuzzy Markovian jump systems with mode-dependent time-varying delay. By constructing a novel Lyapunov-Krasovskii functional and utilizing some advanced techniques, less conservative conditions are presented to guarantee that the closed-loop system is mean-square exponentially stable. ...

Human learning in non-Markovian decision making

Humans can learn under a wide variety of feedback conditions. Particularly important types of learning fall under the category of reinforcement learning (RL), where a series of decisions must be made and a sparse feedback signal is obtained. Computational and behavioral studies of RL have focused mainly on Markovian decision processes (MDPs), where the next state and reward depend only on the c...

Journal title:

Volume   Issue

Pages   -

Publication date: 2002